Skip to content

chore(rivetkit): wasm support#4860

Draft
NathanFlurry wants to merge 42 commits into04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizationsfrom
04-29-chore_rivetkit_wasm_support
Draft

chore(rivetkit): wasm support#4860
NathanFlurry wants to merge 42 commits into04-29-feat_sqlite_add_cold_read_benchmarks_and_simplify_optimizationsfrom
04-29-chore_rivetkit_wasm_support

Conversation

@NathanFlurry
Copy link
Copy Markdown
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Copy Markdown
Member Author

NathanFlurry commented Apr 30, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 2 times, most recently from c2da5e3 to cc67013 Compare April 30, 2026 20:24
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Review: chore(rivetkit): wasm support

This is a substantial Phase 1 PR that adds remote SQLite execution (v4 envoy protocol) and a WASM runtime path (rivetkit-wasm). The architecture is well-designed — the split into native-transport/wasm-transport features, the typed ProtocolCompatibilityError hierarchy, and the RAII inflight guard pattern are all solid. A few issues below.


Bugs / Correctness

WasmActorConfig has dead fields that are silently dropped

rivetkit-typescript/packages/rivetkit-wasm/src/lib.rs defines create_state_timeout_ms, on_create_timeout_ms, on_wake_timeout_ms, on_before_actor_start_timeout_ms, and on_request_timeout_ms in WasmActorConfig, but the From<WasmActorConfig> for ActorConfigInput conversion doesn't map them (they don't exist in ActorConfigInput). Users who set these will get silently ignored config. Either remove them from WasmActorConfig or, if they should be supported, add them to ActorConfigInput with corresponding NAPI exposure.

remote_sqlite_inflight_counter is incremented before the semaphore is acquired

In spawn_tracked_remote_sqlite_task (ws_to_tunnel_task.rs), the inflight counter increments at track_remote_sqlite_inflight before worker_permits.acquire_owned().await. If all 32 worker slots are full, the counter will be non-zero while the task is queued waiting for a permit — so actor_stopped will block for the full remote_sqlite_stop_timeout even if the actual SQL work hasn't started. This is the safer direction (avoid closing the DB while tasks are pending), but worth documenting explicitly since it can make actor stop slower than expected under load.

actor_stopped spawned unconditionally without structured cleanup on drain timeout

The caller in ToRivetEvents now unconditionally spawns actor_stopped and drops the handle. If the SQLite drain times out (the ensure! in actor_lifecycle.rs), the error is logged but the active_actors entry is never removed and the executor cell leaks for the lifetime of the connection. The previous in-line call also swallowed errors with warn!, but consider whether the executor/inflight cleanup should still run on drain timeout to avoid accumulated state.


Convention Violations

Inline #[cfg(test)] mod tests in rivetkit-sqlite-types/src/lib.rs

CLAUDE.md: Rust tests live under tests/, not inline #[cfg(test)] mod tests in src/.
The two tests (execute_result_preserves_result_and_route_metadata and execute_result_projects_query_and_exec_results) should move to rivetkit-rust/packages/rivetkit-sqlite-types/tests/.

Missing doc comment on ActorConfig::remote_sqlite

The adjacent has_database field has a full explanation of what it controls and why. remote_sqlite has none. Add a one-liner explaining it selects SqliteBackend::RemoteEnvoy vs LocalNative.


Performance / Design Notes

handle_message clones Arc<Conn> for every message

The signature change from &Conn to Arc<Conn> adds an atomic refcount inc/dec per parsed message, even for the majority of message types that never need the Arc (all existing SQLite page I/O paths just reborrow with &conn). Consider keeping the parameter as &Arc<Conn> and only cloning in the ActorStateStopped and remote SQL dispatch branches.

Global 32-worker limit per connection, not per actor

REMOTE_SQLITE_WORKER_LIMIT = 32 is per-envoy-connection. A single high-traffic actor can saturate the pool. This is a reasonable starting point, but worth a TODO comment flagging per-actor limiting as a follow-up (the spec already notes this gap).


What's well done

  • Typed ProtocolCompatibilityError (versioned.rs): replacing bail!("sqlite requests require envoy-protocol v2") with a structured error carrying feature, direction, required_version, and target_version is a clear improvement. The compat tests in remote_sql_compat.rs cover all four cross-version directions cleanly.
  • compile_error! mutual-exclusion guards in connection/mod.rs: enforcing exactly one transport feature at compile time is the right approach.
  • RAII RemoteSqliteInflightGuard: incrementing on construction and decrementing on Drop is correct and doesn't require the spawned task to manage cleanup explicitly.
  • OnceCell per (actor_id, generation) executor: lazy init on first SQL request, reused for the generation lifetime, torn down on actor stop — exactly what the spec describes.
  • runtime-import-guard.test.ts: enforcing that raw NAPI/WASM binding imports stay behind the two adapter files is an excellent lightweight architectural lint.
  • rivetkit-sqlite-types extraction: removing the duplicated stub enums (BindParam, ColumnValue, ExecuteResult) under #[cfg(not(feature = "sqlite"))] in favor of a shared crate is a clean improvement.
  • open_database_from_engine generation fence check: the ensure! on generation equality before registering the VFS is the correct guard against stale executor reuse.
  • validate_remote_sqlite_request trust boundary: the namespace + actor state + generation checks are well-placed at the envoy edge per the trust boundary conventions.

Minor

  • scripts/ralph/prd.json, scripts/ralph/progress.txt, scripts/ralph/.last-branch, and the archived copies under scripts/ralph/archive/ look like agent working state included accidentally.
  • .mcp.json deletion is unrelated to the feature; confirm it's intentional.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from cc67013 to 716eee0 Compare April 30, 2026 20:29
@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

Code Review: WASM Support + Remote SQLite Execution

This PR adds WASM support for rivetkit-core and introduces remote SQLite execution over the envoy protocol (v4). The architecture is well-structured and the invariants are clearly documented in .agent/specs/rivetkit-core-wasm-support.md. Overall this is a solid implementation — feedback below is mostly medium/minor.


Protocol versioning (engine/sdks/rust/envoy-protocol/src/versioned.rs)

_ => Ok(()) on owned enum variants — CLAUDE.md violation

ensure_to_envoy_v1_compatible and ensure_to_rivet_v1_compatible (and their v2/v3 variants) use _ => Ok(()) as a catch-all on v4::ToEnvoy / v4::ToRivet. Per CLAUDE.md, wildcard fall-through on owned enum types means a future protocol message variant could silently pass a compatibility gate without a compile-time reminder to update the exclusion list.

// ensure_to_envoy_v1_compatible line 128
_ => Ok(()),  // new v5 message types would slip through here

Recommendation: enumerate every variant explicitly in each ensure_*_compatible function, or at minimum document that the _ => Ok(()) arm is deliberately inclusive and describe the invariant that makes it safe (e.g. "all v4 messages not listed above existed in v1").

_ => bail!(...) on v1::ToEnvoy in convert_to_envoy_v1_to_v2

Same rule. v1:: is a published, frozen schema so new variants won't appear in practice, but the same lint applies. _ => unreachable!() would at least make the assertion explicit.


Error classification (rivetkit-rust/packages/rivetkit-core/src/actor/sqlite.rs)

remote_sqlite_error_response uses string matching to distinguish "unavailable" from "execution failed":

fn remote_sqlite_error_response(message: String) -> anyhow::Error {
    if message.contains("unavailable") || message.contains("unsupported") {
        return SqliteRuntimeError::RemoteUnavailable { reason: message }.build();
    }
    SqliteRuntimeError::RemoteExecutionFailed { message }.build()
}

This is fragile — changing the error message wording in the envoy would silently reclassify errors. Since the protocol already has typed union responses (SqliteExecResponse = SqliteExecOk | SqliteFenceMismatch | SqliteErrorResponse), consider wrapping the error earlier in the response-to-anyhow conversion with a typed sentinel (similar to how ProtocolCompatibilityError is used for the unavailable-version case) so callers can downcast_ref deterministically instead of scanning strings.


Message serialization panic (engine/sdks/rust/envoy-client/src/connection/mod.rs)

let encoded = crate::protocol::versioned::ToRivet::wrap_latest(message)
    .serialize(protocol::PROTOCOL_VERSION)
    .expect("failed to encode message");  // panics instead of returning error

The .expect here would take down the entire envoy connection on an unexpected serialization failure. A tracing::error! + early return (or propagating with ?) would be more graceful.


runtime.rs duplicated #[cfg] blocks

native-runtime and the "neither runtime feature" fallback have identical bodies for RuntimeBoxFuture, RuntimeFuture, RuntimeFutureOutput, and RuntimeSpawner::spawn. The duplication compiles correctly but creates maintenance surface. A single #[cfg(not(feature = "wasm-runtime"))] guard would cover both without repeating:

#[cfg(not(feature = "wasm-runtime"))]
pub type RuntimeBoxFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

Minor: commented-out code in test support

engine/packages/pegboard-envoy/tests/support/ws_to_tunnel_task.rs has dead code left over from development:

// use scc::HashMap;
// let authorized_tunnel_routes = HashMap::new();
// let authorized_tunnel_routes = HashMap::new();

These should be removed.


What looks good

  • Protocol compatibility gating: ProtocolCompatibilityError { feature: RemoteSqliteExecution, required_version: 4 } is a clean, typed mechanism; rivetkit-core correctly maps it to SqliteRuntimeError::RemoteUnavailable with downcast_ref.
  • Input validation: 1 MiB SQL limit, 1024 param limit, and 1 MiB bind-param limit in ws_to_tunnel_task.rs are applied before the request reaches the executor.
  • Indeterminate result semantics: fail_sent_remote_sqlite_requests_with_indeterminate_result correctly separates "unsent, safe to retry" from "sent, outcome unknown" on WS disconnect.
  • AsyncCounter: wait_zero correctly arms the Notify permit before the counter check, avoiding the race described in CLAUDE.md ("arm the notification before re-checking").
  • Lock choices: parking_lot::Mutex in actor/sqlite.rs has justifying comments ("Forced-sync: native SQLite handles are read from synchronous diagnostic accessors..."). scc::HashMap is used correctly for the actor map in ws_to_tunnel_task.rs.
  • Feature gating: wasm-runtime / sqlite-local gates are consistently applied and connection/mod.rs uses compile_error! to catch mutually exclusive feature combinations at build time.
  • Test coverage: remote_execution_parity.rs and rivetkit-core/tests/sqlite.rs cover protocol conversion, error mapping, and backend selection across feature combinations.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 30, 2026

PR Review: chore(rivetkit): wasm support

Scope: 24,392 additions / 2,640 deletions across ~180 files. This adds WebAssembly runtime support for RivetKit edge platforms (Cloudflare Workers, Supabase Edge Functions, Deno Deploy), a new envoy-protocol v4 with remote SQLite execution, WASM transport for envoy-client, and a crate::time abstraction layer.


Significant Issues

1. serialize_version no longer enforces the target version — backward-compat regression

In engine/sdks/rust/envoy-protocol/src/versioned.rs, serialize_version(self, _version: u16) now ignores the _version parameter entirely for all three protocol message types. The old code used the version to downgrade v3 messages to v1/v2 wire format and apply compatibility checks (ensure_to_envoy_v1_compatible, etc.). The new code serializes each Self::Vx variant using its own schema regardless of the requested version, meaning callers that pass conn.protocol_version will silently produce a v4-schema binary even when the peer only speaks v3.

The convert_to_envoy_v4_to_v3 etc. vectors are defined but never wired to the _version parameter — they are dead code.

2. actor_stopped spawned in a detached task creates an untracked error window

In ws_to_tunnel_task.rs, actor_lifecycle::actor_stopped is now spawned as an untracked tokio::spawn. If the task panics (e.g., internal unwrap), the actor entry remains in active_actors forever, blocking future remote SQLite validation for that actor. Track the task in the same JoinSet pattern used elsewhere or document why fire-and-forget is safe here.

3. validate_remote_sqlite_active_generation silently permits Stopping actors without comment

actor_lifecycle::ActiveActorState::Stopping => {}

This allows SQL execution on a stopping actor. While the in-flight counter drain in actor_stopped should prevent a close race, the intent is non-obvious. Per CLAUDE.md: non-obvious behavior requires a comment explaining the invariant.

4. convert_same_bytes uses raw serde_bare for cross-version conversion

The convert_same_bytes<T, U> helper round-trips through serde_bare::to_vec + from_slice. CLAUDE.md requires vbare (versioned BARE) not raw serde_bare for wire/persisted formats. Even as a transient migration encoding, this needs an explicit comment justifying why the version header is omitted here.

5. select_sqlite_backend returns LocalNative when enabled = false

fn select_sqlite_backend(enabled: bool, remote_sqlite: bool) -> SqliteBackend {
    if enabled && remote_sqlite { return SqliteBackend::RemoteEnvoy; }
    #[cfg(feature = "sqlite-local")]
    { SqliteBackend::LocalNative }  // ← returned even when enabled = false
    ...
}

When enabled = false and remote_sqlite = false, the function returns LocalNative, causing non-database actors to attempt to open a real database if the feature flag is on. The Unavailable variant should be returned when enabled = false.

6. not_global silently ignored on WASM

start_envoy_sync unconditionally skips the not_global branch on wasm32 targets. Per the fail-by-default rule, if not_global: false is meaningless on WASM, assert it or return an explicit configuration error instead of silently ignoring it.


Security Issues

7. Namespace validation accepts both UUID and name — expands attack surface

ensure!(
    namespace_id == conn.namespace_id.to_string() || namespace_id == conn.namespace_name.as_str(),
    "remote sqlite namespace does not match envoy connection"
);

Accepting either the UUID or human-readable name allows a request to pass if the name collides with another namespace's UUID. Validate only against the canonical namespace_id UUID unless name-based matching is needed for compatibility (in which case, document why).

8. MAX_REMOTE_SQL_BIND_BYTES check with saturating_add continues past the limit

total_bytes = total_bytes.saturating_add(remote_sqlite_param_bytes(param));
ensure!(total_bytes <= MAX_REMOTE_SQL_BIND_BYTES, ...);

After saturating_add saturates at u64::MAX, subsequent iterations keep iterating and produce a misleading error. Use checked_add with an early return for true fail-fast semantics.


Code Quality Issues

9. anyhow! macro used where .context() is preferred

CLAUDE.md requires preferring .context() over the anyhow! macro. Multiple new files use anyhow::anyhow!(...) for simple string errors:

  • envoy-client/src/handle.rs: anyhow!("envoy channel closed"), anyhow!("remote sqlite response channel closed")
  • envoy-client/src/sqlite.rs: anyhow!("remote sqlite request timed out")
  • envoy-client/src/connection/wasm.rs: various anyhow!(...) calls
  • rivetkit-wasm/src/lib.rs: many anyhow!(...) calls in the WASM actor adapter

10. timeout on WASM may panic or silently hang

tokio::time::timeout requires a background time wheel. The new crate::time::timeout abstraction wraps it, but it's unclear whether a WASM-compatible implementation is provided at all call sites (e.g., the actor/task.rs shutdown drain timeout). If not guaranteed within a LocalSet, this will panic or hang on WASM targets.

11. PROTOCOL_MK2_VERSION / PROTOCOL_VERSION not bumped together

CLAUDE.md: "When bumping the protocol version, update PROTOCOL_MK2_VERSION in engine/packages/runner-protocol/src/lib.rs and PROTOCOL_VERSION in rivetkit-typescript/packages/engine-runner/src/mod.ts together." The PR bumps envoy-protocol to v4 but these constants appear unchanged. Please confirm whether they need a parallel bump or explicitly document why they are exempt.


Test Coverage Gaps

12. No test for namespace mismatch rejection in validate_remote_sqlite_request

The primary authorization gate for remote SQLite has no test asserting that a mismatched namespace_id is rejected. Given the security concern in item 7, this test is essential.

13. all_remote_sql_request_variants_require_v4 asserts hardcoded version 3

assert_compatibility_error(err, ProtocolCompatibilityDirection::ToRivet, 3);

This assertion hardcodes target_version: 3 regardless of which version is being tested in the loop (1..4). Pass the loop variable so future version-range changes are caught.

14. Missing test for BuildingServerless waiter race in WASM smoke test

wasm-host-smoke.test.ts doesn't cover handleServerlessRequest called before register() completes — specifically the BuildingServerless state machine race exercised by serverless_runtime() in rivetkit-wasm/src/lib.rs.


Style / Convention Issues

15. Comment style violations (dash separators)

CLAUDE.md: "Avoid fragmented structures with parentheticals and dashes." Multiple new comments use // Foo - bar patterns:

  • conn.rs: // Forced-sync: debounce bookkeeping...
  • ws_to_tunnel_task.rs: comments above the spawned actor_stopped task

Rewrite as complete sentences separated by periods.

16. rivetkit-wasm/src/lib.rs missing #[cfg(target_arch = "wasm32")] on start_run_handler

start_run_handler calls wasm_bindgen_futures::spawn_local without a conditional compilation guard. If compiled for a non-wasm32 target (e.g., integration test harness), this will fail to compile.


Minor Observations

  • AsyncCounter in envoy-client/src/async_counter.rs correctly arms Notified before checking the count, consistent with the CLAUDE.md performance rule. Good implementation.
  • endpoints_match now silently splits on commas — this is a behavior change for URLs containing commas. Add a unit test for the comma-splitting path.
  • bytes_response uses let _ = tx.try_send(...) which silently drops send failures. A tracing::debug! on failure would be more defensive.
  • The GET /start alias in serverless.rs route matching has no comment explaining which platform requires it.
  • The RuntimeSpawner not(any(native-runtime, wasm-runtime)) fallback in rivetkit-core/src/runtime.rs is identical to the native-runtime copy — a comment explaining when this fires would prevent confusion.

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 716eee0 to 8da73a1 Compare April 30, 2026 20:35
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Preview packages published to npm

Install with:

npm install rivetkit@pr-4860

All packages published as 0.0.0-pr.4860.2a8ccdf with tag pr-4860.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-2a8ccdf
docker pull rivetdev/engine:full-2a8ccdf
Individual packages
npm install rivetkit@pr-4860
npm install @rivetkit/react@pr-4860
npm install @rivetkit/rivetkit-napi@pr-4860
npm install @rivetkit/workflow-engine@pr-4860

@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch 2 times, most recently from 06acbb1 to 6213c3a Compare May 2, 2026 00:13
@NathanFlurry NathanFlurry force-pushed the 04-29-chore_rivetkit_wasm_support branch from 6213c3a to 4966b21 Compare May 2, 2026 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant